Linguistic Indicators for Quality Estimation of Machine Translations

Authors

  • Mariano Felice
  • Lucia Specia
  • Xavier Blanco

Abstract

This work presents a study of linguistically informed features for the automatic quality estimation of machine translations. In particular, we address the problem of estimating quality when no reference translations are available, as this is the most common case in real-world situations. Unlike previous attempts that make use of internal information from translation systems or rely on purely shallow aspects, our approach uses features derived from the source and target text as well as additional linguistic resources, such as parsers and monolingual corpora. We built several models using a supervised regression algorithm and different combinations of features, contrasting purely shallow, linguistic and hybrid sets. Evaluation of our linguistically enriched models yields mixed results. On the one hand, all our hybrid sets beat a shallow baseline in terms of Mean Absolute Error (MAE); on the other hand, purely linguistic feature sets are unable to outperform shallow features. However, a detailed analysis of individual feature performance and of optimal sets obtained from feature selection reveals that shallow and linguistic features are in fact complementary and must be carefully combined to achieve optimal results. In effect, we demonstrate that the best-performing models are actually based on hybrid sets containing a significant proportion of linguistic features. Furthermore, we show that linguistic information can produce consistently better quality estimates for specific score intervals. Finally, we analyse many factors that may have an impact on the performance of linguistic features and suggest new directions for mitigating them in the future.
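
As a rough illustration of the setup described above, the Python sketch below shows how a few shallow features might be computed from a source/target pair, merged with a linguistic feature set into a hybrid set, and how predicted quality scores can be scored with mean absolute error (MAE), where lower is better. The feature names (`src_len`, `tgt_len`, `len_ratio`, `tgt_parse_log_prob`) and the helper functions are hypothetical and are not the authors' feature set; the regression model itself is omitted, and the linguistic value is a made-up placeholder standing in for something a parser might produce.

```python
from typing import Dict, Sequence


def shallow_features(source: str, target: str) -> Dict[str, float]:
    """Hypothetical shallow features: whitespace token counts and their ratio."""
    src_tokens = source.split()
    tgt_tokens = target.split()
    return {
        "src_len": float(len(src_tokens)),
        "tgt_len": float(len(tgt_tokens)),
        # Guard against division by zero for an empty source sentence.
        "len_ratio": len(tgt_tokens) / max(len(src_tokens), 1),
    }


def hybrid_features(shallow: Dict[str, float],
                    linguistic: Dict[str, float]) -> Dict[str, float]:
    """Merge a shallow and a linguistic feature set into one hybrid set."""
    overlap = shallow.keys() & linguistic.keys()
    if overlap:
        # Refuse to silently overwrite a feature that appears in both sets.
        raise ValueError(f"duplicate feature names: {sorted(overlap)}")
    return {**shallow, **linguistic}


def mean_absolute_error(gold: Sequence[float],
                        predicted: Sequence[float]) -> float:
    """Average absolute difference between gold and predicted quality scores."""
    if len(gold) != len(predicted) or len(gold) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(g - p) for g, p in zip(gold, predicted)) / len(gold)


if __name__ == "__main__":
    features = hybrid_features(
        shallow_features("the cat sat", "el gato se sentó"),
        {"tgt_parse_log_prob": -42.0},  # hypothetical parser-derived value
    )
    print(features)

    gold = [4.0, 2.5, 3.0]  # hypothetical human quality scores
    print(mean_absolute_error(gold, [3.5, 2.5, 3.5]))  # predictions of model A
    print(mean_absolute_error(gold, [4.0, 3.0, 2.0]))  # predictions of model B
```

Comparing two feature sets then amounts to training one regressor per set and comparing their MAE on the same held-out sentences.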

Similar articles

Linguistic Features for Quality Estimation

This paper describes a study on the contribution of linguistically informed features to the task of quality estimation for machine translation at sentence level. A standard regression algorithm is used to build models using a combination of linguistic and non-linguistic features extracted from the input text and its machine translation. Experiments with English-Spanish translations show that lin...

Predicting Machine Translation Adequacy

As Machine Translation (MT) becomes more popular among end-users, an increasingly relevant issue is that of estimating the quality of automatic translations for a particular task. The main application for such quality estimates has been selecting good enough translations for human post-editing. The end-users, in this case, are fluent speakers of both source and target languages and the quality e...

MT Quality Estimation for E-Commerce Data

In this paper we present a system that automatically estimates the quality of machine-translated segments of e-commerce data without relying on reference translations. Such an approach can be used to estimate the quality of machine-translated text in scenarios in which references are not available. Quality estimation (QE) can be applied to select translations to be post-edited, choose the best tran...

The role of artificially generated negative data for quality estimation of machine translation

The modelling of natural language tasks using data-driven methods is often hindered by the problem of insufficient naturally occurring examples of certain linguistic constructs. The task we address in this paper – quality estimation (QE) of machine translation – suffers from lack of negative examples at training time, i.e., examples of low quality translation. We propose various ways to artific...

Okapi+QuEst: Translation Quality Estimation within Okapi

Due to the ever growing applicability of machine translation, estimating the quality of translations automatically has become a necessary task in various scenarios, for example, when deciding whether a machine translation is good enough for human post-editing. This demonstration presents the outcome of a collaborative project between the University of Sheffield and ENLASO, funded by EAMT, the E...

Publication date: 2012